Provide optimized writers for OpenTelemetry's "trace.proto" wire protocol #11120
Benchmarks

Startup

Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 63 metrics, 8 unstable metrics.

Startup time reports for petclinic:
[Gantt chart: petclinic - global startup overhead: candidate=1.62.0-SNAPSHOT~97b5fc9e3d, baseline=1.62.0-SNAPSHOT~d5d2097cb9 — Agent and Total startup times for the tracing, appsec, iast, and profiling configurations]
[Gantt chart: petclinic - break down per module: candidate=1.62.0-SNAPSHOT~97b5fc9e3d, baseline=1.62.0-SNAPSHOT~d5d2097cb9 — per-module startup times (crashtracking, BytebuddyAgent, AgentMeter, GlobalTracer, IAST, AppSec, Debugger, Remote Config, Telemetry, Flare Poller, ProfilingAgent, Profiling) for each configuration]
Startup time reports for insecure-bank:

[Gantt chart: insecure-bank - global startup overhead: candidate=1.62.0-SNAPSHOT~97b5fc9e3d, baseline=1.62.0-SNAPSHOT~d5d2097cb9 — Agent and Total startup times for the tracing and iast configurations]
[Gantt chart: insecure-bank - break down per module: candidate=1.62.0-SNAPSHOT~97b5fc9e3d, baseline=1.62.0-SNAPSHOT~d5d2097cb9 — per-module startup times for each configuration]
Load

Summary: Found 6 performance improvements and 0 performance regressions! Performance is the same for 14 metrics, 16 unstable metrics.
Request duration reports for insecure-bank:

[Gantt chart: insecure-bank - request duration [CI 0.99]: candidate=1.62.0-SNAPSHOT~97b5fc9e3d, baseline=1.62.0-SNAPSHOT~d5d2097cb9 — request durations for no_agent, iast, iast_FULL, iast_GLOBAL, profiling, and tracing]
Request duration reports for petclinic:

[Gantt chart: petclinic - request duration [CI 0.99]: candidate=1.62.0-SNAPSHOT~97b5fc9e3d, baseline=1.62.0-SNAPSHOT~d5d2097cb9 — request durations for no_agent, appsec, code_origins, iast, profiling, and tracing]
Dacapo

Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metric.

Execution time for tomcat:
[Gantt chart: tomcat - execution time [CI 0.99]: candidate=1.62.0-SNAPSHOT~97b5fc9e3d, baseline=1.62.0-SNAPSHOT~d5d2097cb9 — execution times for no_agent, appsec, iast, iast_GLOBAL, profiling, and tracing]
Execution time for biojava:

[Gantt chart: biojava - execution time [CI 0.99]: candidate=1.62.0-SNAPSHOT~97b5fc9e3d, baseline=1.62.0-SNAPSHOT~d5d2097cb9 — execution times for no_agent, appsec, iast, iast_GLOBAL, profiling, and tracing]
(force-push: 583dc0c to 4adb56e)
```java
/**
 * Collects trace spans and marshalls them into a chunked payload.
 *
 * <p>This payload is only valid for the calling thread until the next collection.
 */
@Override
public OtlpPayload collectSpans(List<DDSpan> spans) {
```
Is List<DDSpan> spans expected to be spans from a single trace? If so, each collectSpans call produces a full TracesData envelope with resource and scope wrappers per trace. This doesn't seem optimal and differs from the Datadog/msgpack implementation? Unless the expectation is that the eventual OtlpWriter will accumulate completed traces and call this once per flush cycle with a combined span list (although that can't be right based on the MetaWriter, which expects just a single trace at a time).
Very good point - on reflection I'll change this to add a flush method so we can accumulate trace chunks over multiple calls.
OK, I've updated the collector API so it has two methods:

- `addTrace(spans)` which adds a trace to the collector
- `collectTraces()` which marshals the collected spans into a payload
This should allow its use as a replacement PayloadDispatcher, which means we can re-use more of the existing remote writer code.
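A minimal sketch of what that two-method collector shape could look like (all type names here are illustrative stand-ins, not the actual dd-trace-java API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for DDSpan and the payload type.
class Span {
  final String name;
  Span(String name) { this.name = name; }
}

class Payload {
  final int traceCount;
  Payload(int traceCount) { this.traceCount = traceCount; }
}

// Sketch of the two-method collector API described above.
class SpanCollector {
  private final List<List<Span>> traces = new ArrayList<>();

  // Adds one completed trace (a list of spans) to the collector.
  void addTrace(List<Span> spans) {
    traces.add(spans);
  }

  // Marshals everything collected so far into one payload and resets
  // the collector, so the next addTrace starts a new payload.
  Payload collectTraces() {
    Payload payload = new Payload(traces.size());
    traces.clear();
    return payload;
  }
}
```

Accumulating traces this way lets the collector flush one combined payload per cycle instead of producing a separate envelope per trace.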
…send them as first-class links (likewise turn off legacy baggage injection)
(force-push: ab2ef0b to 7cdfed7)
dougqh left a comment:
Claude caught a couple of issues...

- NPE and ClassCastException

Since I'm off next week, I'm not going to "request changes". I'll just trust those get fixed and let someone else do the final review.

Also, I added one key performance suggestion around the use of forEach.

And here are a couple more that Claude reported, which I'll leave to your discretion...
Config.get().getServiceName() on every span — OtlpTraceProto.java:137

```java
if (!Config.get().getServiceName().equalsIgnoreCase(span.getServiceName())) {
```

Cache the default service name (ideally as a UTF8BytesString for cheap equality). This runs for every span in every payload.

recordMessage allocates a fresh ByteBuffer + backing array per chunk — OtlpCommonProto.java:126-140

Every span, every link, every scope prefix gets its own heap allocation. Precisely-sized allocations are nice, but the total allocation count scales with the chunk count. If profiling shows GC pressure, a small reusable scratch arena that hands out slices (or an OtlpPayload that owns a large backing buffer with offset/length pairs) would eliminate most of this. The trade-off is lifetime complexity, so it's only worth it if measurements show it matters.
Yes, sadly this is the nature of heavily nested protobuf messages (the protobuf manual says to avoid too much nesting). It means that before we can write out a span we need to know its exact message size, because the size field is written out as a varint prefix before the message content.

You could process traces twice - once to size everything, and again to write it out - but the book-keeping needed for that gets complicated, and you're doubling the CPU time with two passes.

Initial benchmarking showed we're allocating less than OTel with the current approach, mainly because we re-use the same buffer for doing the initial writes before recording each message slice. But I might look into pooling of slices to reduce churn.
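The sizing constraint comes from protobuf's wire format: a length-delimited (nested) message is encoded as a field tag, then a varint byte length, then the message bytes, so the length must be known before the nested content can be emitted. A toy illustration (not the PR's writer code):

```java
import java.io.ByteArrayOutputStream;

class VarintDemo {
  // Writes an unsigned varint: 7 bits per byte, with the high bit set
  // on every byte except the last (protobuf's base-128 encoding).
  static void writeVarint(ByteArrayOutputStream out, int value) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }

  // A nested message is emitted as: field tag, varint length, then the
  // message bytes -- so its exact size must be known up front.
  static byte[] lengthDelimited(int fieldNumber, byte[] message) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write((fieldNumber << 3) | 2); // wire type 2 = length-delimited
    writeVarint(out, message.length);
    out.write(message, 0, message.length);
    return out.toByteArray();
  }
}
```

With deep nesting this compounds: every enclosing message's length depends on the sizes of everything inside it, which is why the PR chunks from the inside out.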
…takes an extra context object
(force-push: 97b5fc9 to e77fe7e)
What Does This Do
Uses a single temporary buffer, as in #10983, to prepare message chunks at different nesting levels (resource / scope / span).

First we chunk all nested messages, i.e. span links, for a given span. Once the span is complete, we add the first part of the span message and its chunked links to the scoped chunks. Once the scope is complete, we add the first part of the scoped-spans message and all its chunks (span messages and their links) to the payload. Once all the span data has been chunked, we add the enclosing resource-spans message to the start of the payload.

Multiple traces can be added to the collector before collecting them into a payload. Note that this payload is only valid for the calling thread until the next collection. Adding traces after collection automatically starts a new payload.
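The bottom-up assembly described above can be pictured with a simplified sketch. This models chunks as strings with hypothetical `len=` headers; the real writer frames protobuf messages with varint length prefixes:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of bottom-up chunk assembly: inner messages are
// chunked first, then each enclosing level prepends its header once
// the total size of its content is known.
class ChunkAssembler {
  static List<String> assemble(List<List<String>> spansPerScope) {
    List<String> payload = new ArrayList<>();
    int resourceSize = 0;
    for (List<String> scope : spansPerScope) {
      List<String> scopeChunks = new ArrayList<>();
      int scopeSize = 0;
      for (String span : scope) {
        // Each span becomes a length-prefixed chunk pair...
        scopeChunks.add("span[len=" + span.length() + "]");
        scopeChunks.add(span);
        scopeSize += span.length();
      }
      // ...then the scope header is added once the scope size is known.
      payload.add("scope[len=" + scopeSize + "]");
      payload.addAll(scopeChunks);
      resourceSize += scopeSize;
    }
    // Finally the enclosing resource header goes at the very front.
    payload.add(0, "resource[len=" + resourceSize + "]");
    return payload;
  }
}
```

The key property is that no chunk is written before the size of everything it encloses is known, while each piece of span data is still only serialized once.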
Motivation

Avoids the need to use the full protobuf library while keeping intermediate array creation to a minimum.
Additional Notes
OtlpTraceProtoTest was created with the help of Claude.
Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR level.